Narrow Width Dynamic Scheduling
Abstract
To meet the demand for higher performance, modern processors are designed with a high degree of speculation. While speculation improves performance, it also wastes power. The cache, store queue, and load queue are searched associatively before a matching entry is determined, so a significant amount of power is spent searching entries that are never selected. Modern processors also schedule instructions speculatively, before operand values are computed, because cycle-time demands preclude including a full ALU and bypass network delay in the instruction scheduling loop. Hence, the latency of a load instruction must be predicted, since it cannot be determined within the scheduling pipeline. Whenever a misprediction occurs because of an unanticipated cache miss, a significant amount of power is wasted on incorrectly issued dependent instructions. This paper exploits the prevalence of narrow operand values by placing fast, narrow ALUs, cache, and datapath within the scheduling loop. The results of this narrow datapath are used to avoid unnecessary activity in the rest of the execution core, creating opportunities to apply several energy reduction techniques. A novel approach that transforms the data cache, store queue, and load queue from associative (or set-associative) to direct-mapped structures saves a significant amount of energy. Additionally, the narrow datapath accurately anticipates virtually all load latency mispredictions, so very little energy is wasted executing incorrectly scheduled instructions. Our narrow datapath design, coupled with a novel partitioned store queue and a pipelined data cache, can achieve cycle times comparable to those of conventional approaches while dramatically reducing misspeculation. The technique saves approximately 27% of the dynamic energy of the out-of-order core, which translates into roughly 11% of total processor dynamic energy, with no performance loss on integer benchmarks.
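One property that makes a narrow datapath useful here is that the low-order bits of a sum depend only on the low-order bits of its operands, so a narrow adder inside the scheduling loop can compute the low bits of a load's effective address exactly. The Python sketch below shows one plausible way such bits could drive early cache decisions: the set index plus a partial tag either rule out a hit (a guaranteed miss, so dependents need not be woken speculatively) or leave a single candidate way to read, direct-mapped style. All parameters (64-byte lines, 64 sets, 4 ways, a 4-bit partial tag, hence a 16-bit adder) and the helper names `narrow_add`, `split`, and `early_lookup` are hypothetical, and this is not presented as the paper's actual design.

```python
from typing import List, Optional, Tuple

# Hypothetical parameters, chosen only for illustration.
LINE_BITS = 6          # 64-byte cache lines
SET_BITS = 6           # 64 sets
PARTIAL_TAG_BITS = 4   # low-order tag bits compared early
NARROW_BITS = LINE_BITS + SET_BITS + PARTIAL_TAG_BITS  # 16-bit narrow adder
MASK = (1 << NARROW_BITS) - 1


def fits_narrow(value: int, bits: int = NARROW_BITS) -> bool:
    """True if a signed value is representable in `bits` bits (a narrow value)."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def narrow_add(base: int, offset: int) -> int:
    """Low NARROW_BITS bits of base + offset, using only the operands' low bits.

    Carries propagate only toward more significant bits, so the result equals
    the low bits of the full-width sum exactly, whatever the upper bits are.
    """
    return ((base & MASK) + (offset & MASK)) & MASK


def split(low_addr: int) -> Tuple[int, int]:
    """Extract (set index, partial tag) from the low address bits."""
    set_index = (low_addr >> LINE_BITS) & ((1 << SET_BITS) - 1)
    partial_tag = (low_addr >> (LINE_BITS + SET_BITS)) & ((1 << PARTIAL_TAG_BITS) - 1)
    return set_index, partial_tag


def early_lookup(partial_tags: List[List[Optional[int]]],
                 base: int, offset: int) -> Tuple[str, Optional[int]]:
    """Classify a load before its full address is available.

    partial_tags[set][way] holds the low tag bits of each valid line, or None
    for an invalid way. Returns:
      ("miss", None)      no way can match: a guaranteed miss
      ("one_way", w)      only way w can match: read just that way
      ("ambiguous", None) several ways match: fall back to a normal lookup
    """
    set_index, ptag = split(narrow_add(base, offset))
    candidates = [w for w, t in enumerate(partial_tags[set_index]) if t == ptag]
    if not candidates:
        return "miss", None
    if len(candidates) == 1:
        return "one_way", candidates[0]
    return "ambiguous", None


if __name__ == "__main__":
    tags: List[List[Optional[int]]] = [[None] * 4 for _ in range(1 << SET_BITS)]
    s, p = split(0x12345678)  # place one line in way 2 of its set
    tags[s][2] = p
    print(early_lookup(tags, 0x12345600, 0x78))    # ('one_way', 2)
    print(early_lookup(tags, 0x12345600, 0x1078))  # ('miss', None)
```

Only a partial-tag match is established early in this sketch; a full tag comparison would still confirm the hit, and an "ambiguous" result would fall back to an ordinary associative access.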
Finally, a less complex flush-based recovery scheme is shown to suffice for high performance because load misscheduling is rare.
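The recovery claim rests on a trade-off between the work discarded per event and the hardware needed to track dependences. The Python sketch below contrasts flush-based recovery, which squashes every instruction younger than the mispredicted load, with selective replay (a common alternative, chosen here only for comparison), which re-executes just the load's transitive dependents. The `(dest, srcs)` instruction encoding, the register names, and the helper names are hypothetical, and the load itself is assumed to be kept rather than squashed.

```python
from typing import List, Tuple

# Hypothetical encoding: (destination register, source registers), program order.
Instr = Tuple[str, Tuple[str, ...]]


def flush_set(window: List[Instr], load_idx: int) -> List[int]:
    """Flush-based recovery: squash every instruction younger than the load.

    Assumes the load itself is kept; everything after it is re-fetched.
    """
    return list(range(load_idx + 1, len(window)))


def replay_set(window: List[Instr], load_idx: int) -> List[int]:
    """Selective replay: re-execute only the load's transitive dependents."""
    tainted = {window[load_idx][0]}  # registers whose value derives from the load
    replay = []
    for i in range(load_idx + 1, len(window)):
        dest, srcs = window[i]
        if tainted.intersection(srcs):
            replay.append(i)
            tainted.add(dest)
        else:
            tainted.discard(dest)  # overwritten by an independent result
    return replay


if __name__ == "__main__":
    window: List[Instr] = [
        ("r1", ("r2",)),       # 0: load r1 <- [r2], latency mispredicted
        ("r3", ("r1", "r4")),  # 1: depends on the load
        ("r5", ("r6",)),       # 2: independent
        ("r7", ("r3",)),       # 3: depends on the load through r3
        ("r1", ("r8",)),       # 4: overwrites r1 with an independent value
        ("r9", ("r1",)),       # 5: reads the new r1, so independent
    ]
    print(flush_set(window, 0))   # [1, 2, 3, 4, 5]
    print(replay_set(window, 0))  # [1, 3]
```

Flush discards more work per event but needs no dependence tracking; when mis-scheduled loads are rare, that extra work is small in aggregate.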
Journal title:
- J. Instruction-Level Parallelism
Volume 9
Publication date: 2007